Statistics: All Roads Lead to Hypothesis Testing

Whether you are testing a coin, a correlation, or a variance, every hypothesis test asks exactly the same question. Once you see this, the subject stops feeling like a collection of unrelated procedures and becomes one repeating idea.

The unifying idea — every time

Assume the null hypothesis H₀ is true. Ask: how probable is it that we would see data at least this extreme? If that probability is at most the significance level α, we have evidence against H₀.

For a two-tailed test at significance level α, split the rejection region equally between the two tails and compare each tail probability with α/2. For tests that use a critical value from tables (PMCC, Spearman, Wilcoxon), the table already encodes this threshold, so the underlying logic is identical.

1. Binomial distribution

A-level Maths: Edexcel, AQA, OCR, OCR MEI

Used when counting the number of successes in a fixed number of independent trials, each with the same probability of success. The classic example is testing whether a coin (or process) is biased.

Setup

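Under H₀, the test statistic X (number of successes) follows:

X ~ B(n, p₀)

Decision rule

For H₁: p > p₀, reject H₀ if P(X ≥ x) ≤ α. For H₁: p < p₀, reject H₀ if P(X ≤ x) ≤ α. For a two-tailed test, compare the tail probability with α/2.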

Worked example

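A die is suspected of showing six too often. In 30 rolls, six appears 9 times. Test at the 5% significance level.

H₀: p = 1/6, H₁: p > 1/6. Under H₀, X ~ B(30, 1/6), so P(X ≥ 9) = 1 − P(X ≤ 8) ≈ 0.0506 > 0.05. Do not reject H₀.

There is insufficient evidence at the 5% level that the die is biased towards six, although the result is very close to the boundary.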

Try it yourself

A coin is tossed 20 times and lands heads 14 times. Test at the 5% level whether the probability of heads exceeds 0.5.

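Answer

H₀: p = 0.5, H₁: p > 0.5. Under H₀, X ~ B(20, 0.5), so P(X ≥ 14) = 1 − P(X ≤ 13) ≈ 0.0577 > 0.05. Do not reject H₀.

There is insufficient evidence at the 5% level that the coin is biased towards heads.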
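
The binomial tail probabilities in this section can be checked without tables. The following is a minimal sketch in Python using only the standard library; `binomial_upper_tail` is a name chosen here for illustration and is not part of any library.

```python
from math import comb

def binomial_upper_tail(n: int, p: float, x: int) -> float:
    """Return P(X >= x) for X ~ B(n, p) by summing the exact probabilities."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(x, n + 1))

# Coin example from this section: 14 heads in 20 tosses, H0: p = 0.5.
p_value = binomial_upper_tail(20, 0.5, 14)
print(round(p_value, 4))   # 0.0577
print(p_value <= 0.05)     # False, so H0 is not rejected at the 5% level
```

For a lower-tailed test the same idea applies with the sum taken over k from 0 to x.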

2. Normal distribution (z-test)

A-level Maths: Edexcel, AQA, OCR, OCR MEI

Used to test the mean of a normally distributed population when the population variance σ² is known. The sample mean X̄ is itself normally distributed.

Setup

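Under H₀:

X̄ ~ N(μ₀, σ²/n), so Z = (X̄ − μ₀) / (σ/√n) ~ N(0, 1)

Decision rule

For H₁: μ > μ₀, reject H₀ if z exceeds the upper critical value (1.645 at 5%, 2.326 at 1%); for H₁: μ < μ₀, reject H₀ if z is below the corresponding negative value. For a two-tailed test, reject H₀ if |z| exceeds the critical value for α/2 in each tail (1.96 at 5%).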

Worked example

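The lengths of bolts are normally distributed with σ = 1.2 mm. The target mean is μ = 25 mm. A sample of 36 bolts has mean 25.5 mm. Is there evidence at the 1% level that the mean has increased?

H₀: μ = 25, H₁: μ > 25. The test statistic is z = (25.5 − 25) / (1.2/√36) = 0.5/0.2 = 2.5. The one-tailed 1% critical value is 2.326, and 2.5 > 2.326 (equivalently, P(Z ≥ 2.5) ≈ 0.0062 < 0.01). Reject H₀.

There is sufficient evidence at the 1% level that the mean length has increased.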

Try it yourself

Packages are claimed to weigh μ = 500 g with known σ = 10 g. A sample of 25 packages gives x̄ = 496 g. Test at 5% (two-tailed) whether the mean weight has changed.

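Answer

H₀: μ = 500, H₁: μ ≠ 500. The test statistic is z = (496 − 500) / (10/√25) = −4/2 = −2. The two-tailed 5% critical values are ±1.96, and |−2| > 1.96 (equivalently, P(Z ≤ −2) ≈ 0.0228 < 0.025). Reject H₀.

There is sufficient evidence at the 5% level that the mean weight has changed.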
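
The z-test calculations in this section can be reproduced with Python's standard library. This is a sketch rather than a definitive implementation; `z_statistic` is an illustrative helper name, and `statistics.NormalDist` supplies the standard normal distribution function.

```python
from math import sqrt
from statistics import NormalDist

def z_statistic(sample_mean: float, mu0: float, sigma: float, n: int) -> float:
    """Standardise the sample mean under H0: mu = mu0, with sigma known."""
    return (sample_mean - mu0) / (sigma / sqrt(n))

standard_normal = NormalDist()  # N(0, 1)

# Bolt example: one-tailed test for an increase at the 1% level.
z_bolts = z_statistic(25.5, 25, 1.2, 36)
p_bolts = 1 - standard_normal.cdf(z_bolts)          # upper-tail probability
print(round(z_bolts, 3), round(p_bolts, 4))         # 2.5 0.0062

# Package example: two-tailed test at the 5% level, so double the smaller tail.
z_packages = z_statistic(496, 500, 10, 25)
p_packages = 2 * standard_normal.cdf(-abs(z_packages))
print(round(z_packages, 3), round(p_packages, 4))   # -2.0 0.0455
```

Doubling the smaller tail and comparing with α is equivalent to comparing one tail with α/2.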

3. Product moment correlation coefficient (PMCC)

A-level Maths: Edexcel, AQA, OCR, OCR MEI

Tests whether there is linear correlation between two variables in a bivariate normal population. The null hypothesis is always that the population correlation ρ is zero.

Setup

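The test statistic is the sample PMCC r. Critical values r_crit are read from tables for the given n and significance level; for a one-tailed test they encode P(R ≥ r_crit | ρ = 0) = α.

Decision rule

With H₀: ρ = 0, reject H₀ in favour of H₁: ρ > 0 if r ≥ r_crit, or in favour of H₁: ρ < 0 if r ≤ −r_crit. For H₁: ρ ≠ 0, reject H₀ if |r| ≥ r_crit, using the two-tailed column of the table.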

Worked example

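For n = 10 data pairs, r = 0.648. Is there evidence of positive correlation at the 5% level?

H₀: ρ = 0, H₁: ρ > 0. The one-tailed 5% critical value for n = 10 is 0.5494, and 0.648 > 0.5494. Reject H₀.

There is sufficient evidence at the 5% level of positive correlation.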

Try it yourself

For n = 15 data pairs, r = 0.52. Test at 5% for positive correlation.

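Answer

H₀: ρ = 0, H₁: ρ > 0. The one-tailed 5% critical value for n = 15 is 0.4409, and 0.52 > 0.4409. Reject H₀.

There is sufficient evidence at the 5% level of positive correlation.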
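
The examples in this section quote r directly. As a sketch of where r comes from, the code below computes the sample PMCC from raw pairs and compares it with a table value. The data are hypothetical and invented purely for illustration, `pmcc` is an illustrative helper name, and 0.8054 is the one-tailed 5% critical value for n = 5 from standard PMCC tables.

```python
from math import sqrt

def pmcc(xs: list[float], ys: list[float]) -> float:
    """Sample product moment correlation coefficient r = S_xy / sqrt(S_xx * S_yy)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    return s_xy / sqrt(s_xx * s_yy)

# Hypothetical data, invented purely to exercise the function.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
r = pmcc(xs, ys)
print(round(r, 4))  # 0.7746

# One-tailed 5% critical value for n = 5 from standard PMCC tables.
R_CRIT_N5 = 0.8054
print(r >= R_CRIT_N5)  # False: not enough evidence of positive correlation
```

With only five pairs even a fairly large r falls short of the critical value, which is why the tables depend on n.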

4. Poisson distribution

A-level Further Maths: Edexcel (Further Statistics 1), AQA (Statistics), OCR (Statistics), OCR MEI (Statistics)

Used when counting events that occur randomly in a fixed interval of time or space. Tests whether the underlying rate λ has changed from a known baseline.

Setup

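Under H₀:

X ~ Po(λ₀)

Decision rule

For H₁: λ > λ₀, reject H₀ if P(X ≥ x) ≤ α. For H₁: λ < λ₀, reject H₀ if P(X ≤ x) ≤ α. For a two-tailed test, compare the tail probability with α/2.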

Worked example

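A call centre receives an average of λ = 4 calls per minute. In one minute, 9 calls arrive. Test at the 5% level whether the rate has increased.

H₀: λ = 4, H₁: λ > 4. Under H₀, X ~ Po(4), so P(X ≥ 9) = 1 − P(X ≤ 8) ≈ 0.0214 < 0.05. Reject H₀.

There is sufficient evidence at the 5% level that the call rate has increased.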

Try it yourself

Faults occur at an average rate of λ = 2 per hour. In one hour, 6 faults are recorded. Test at 5% whether the rate has increased.

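Answer

H₀: λ = 2, H₁: λ > 2. Under H₀, X ~ Po(2), so P(X ≥ 6) = 1 − P(X ≤ 5) ≈ 0.0166 < 0.05. Reject H₀.

There is sufficient evidence at the 5% level that the fault rate has increased.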
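
The Poisson tail probabilities used in this section can be computed directly from the probability formula. A minimal sketch, with `poisson_upper_tail` as an illustrative name:

```python
from math import exp, factorial

def poisson_upper_tail(lam: float, x: int) -> float:
    """Return P(X >= x) for X ~ Po(lam) as 1 - P(X <= x - 1)."""
    lower = sum(exp(-lam) * lam**k / factorial(k) for k in range(x))
    return 1 - lower

print(round(poisson_upper_tail(4, 9), 4))  # 0.0214 (call centre example)
print(round(poisson_upper_tail(2, 6), 4))  # 0.0166 (faults example)
```

Both values are below 0.05, matching the conclusions of the two examples.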

5. Geometric distribution

A-level Further Maths: AQA (Statistics), OCR MEI (Statistics)

Models the number of trials needed to achieve the first success. Useful for testing whether the underlying probability of success has changed.

Setup

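Under H₀, X = number of trials to first success has distribution:

X ~ Geo(p₀), with P(X = x) = (1 − p₀)^(x−1) p₀ and P(X ≥ x) = (1 − p₀)^(x−1)

A lower probability of success means we expect to wait longer, so H₁: p < p₀ corresponds to large values of X.

Decision rule

For H₁: p < p₀, reject H₀ if P(X ≥ x) ≤ α. For H₁: p > p₀, reject H₀ if P(X ≤ x) = 1 − (1 − p₀)^x ≤ α.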

Worked example

A machine produces defective items with probability p = 0.3. An engineer suspects the fault rate has fallen. The first defective item is found on the 10th item inspected. Test at the 5% level.

H₀: p = 0.3, H₁: p < 0.3. Under H₀, X ~ Geo(0.3), so P(X ≥ 10) = 0.7⁹ ≈ 0.0404 < 0.05. Reject H₀.

There is sufficient evidence at the 5% level that the fault rate has decreased.

Try it yourself

A machine has a defect probability of p = 0.4. After maintenance, the first defective item appears on the 8th item inspected. Test at 5% whether the defect probability has decreased.

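Answer

H₀: p = 0.4, H₁: p < 0.4. Under H₀, X ~ Geo(0.4), so P(X ≥ 8) = 0.6⁷ ≈ 0.0280 < 0.05. Reject H₀.

There is sufficient evidence at the 5% level that the defect probability has decreased.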
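
For the geometric distribution the upper tail has a closed form: X ≥ x happens exactly when the first x − 1 trials all fail. A short sketch, with `geometric_upper_tail` as an illustrative name:

```python
def geometric_upper_tail(p: float, x: int) -> float:
    """Return P(X >= x) for X ~ Geo(p): the first x - 1 trials must all fail."""
    return (1 - p) ** (x - 1)

print(round(geometric_upper_tail(0.3, 10), 4))  # 0.0404
print(round(geometric_upper_tail(0.4, 8), 4))   # 0.028
```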

6. Negative binomial distribution

A-level Further Maths: OCR MEI (Statistics)

Extends the geometric distribution to the number of trials needed to achieve the r-th success. It is examined in OCR MEI Further Maths and uses the same hypothesis-testing framework.

Setup

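Under H₀, X = number of trials to r-th success:

X ~ NB(r, p₀), with P(X = x) = C(x − 1, r − 1) p₀^r (1 − p₀)^(x−r) for x = r, r + 1, ...

Decision rule

For H₁: p < p₀, reject H₀ if P(X ≥ x) ≤ α; for H₁: p > p₀, reject H₀ if P(X ≤ x) ≤ α. The upper tail is easiest through a binomial: P(X ≥ x) = P(Y ≤ r − 1), where Y ~ B(x − 1, p₀) counts the successes in the first x − 1 trials.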

Worked example

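A seed has germination probability p = 0.6. It takes 7 trials to get the 3rd germination. Test at the 10% level whether germination probability has decreased.

H₀: p = 0.6, H₁: p < 0.6. Under H₀, X ~ NB(3, 0.6), so P(X ≥ 7) = P(Y ≤ 2) where Y ~ B(6, 0.6). This gives 0.4⁶ + 6(0.6)(0.4)⁵ + 15(0.6)²(0.4)⁴ = 0.0041 + 0.0369 + 0.1382 ≈ 0.1792 > 0.10. Do not reject H₀.

There is insufficient evidence at the 10% level that germination probability has decreased.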

Try it yourself

A coin has probability p = 0.5 of heads. A suspected biased coin takes 8 flips to get the 2nd head. Test at 10% whether the probability of heads has decreased.

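Answer

H₀: p = 0.5, H₁: p < 0.5. P(X ≥ 8) = P(at most 1 head in the first 7 flips | p = 0.5) = P(Y ≤ 1) where Y ~ B(7, 0.5):

P(Y ≤ 1) = (1 + 7) / 2⁷ = 8/128 = 0.0625 < 0.10. Reject H₀.

There is sufficient evidence at the 10% level that the probability of heads has decreased.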
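
The binomial shortcut used in the answer above generalises to any r. The sketch below is one way to code it; `neg_binomial_upper_tail` is an illustrative name, not a library function.

```python
from math import comb

def neg_binomial_upper_tail(r: int, p: float, x: int) -> float:
    """Return P(X >= x) where X is the trial on which the r-th success occurs.

    X >= x exactly when the first x - 1 trials contain at most r - 1 successes,
    so the tail is a binomial cumulative probability for Y ~ B(x - 1, p).
    """
    m = x - 1
    return sum(comb(m, k) * p**k * (1 - p) ** (m - k) for k in range(r))

# Coin example: 2nd head on the 8th flip, H0: p = 0.5.
print(neg_binomial_upper_tail(2, 0.5, 8))  # 0.0625
```

Setting r = 1 recovers the geometric tail (1 − p)^(x−1), which is a quick consistency check.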

7. Chi-squared test

A-level Further Maths: Edexcel (Further Statistics 1), AQA (Statistics), OCR (Statistics A), OCR MEI (Statistics)

The χ² test appears in two forms: goodness of fit (does data follow a proposed distribution?) and independence (are two categorical variables related?). The test statistic and decision rule are the same in both cases.

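Test statistic

χ² = Σ (O_i − E_i)² / E_i, which under H₀ is approximately χ²(ν)

where O_i are observed frequencies, E_i are expected frequencies, and ν is the degrees of freedom:

Goodness of fit: ν = (number of cells) − 1 − (number of parameters estimated from the data)
Independence (r × c contingency table): ν = (r − 1)(c − 1)

Decision rule

Reject H₀ if the observed χ² exceeds the critical value of χ²(ν) at significance level α. The test is always upper-tailed, because only large discrepancies between O_i and E_i count against H₀.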

Goodness-of-fit example

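A die is rolled 60 times. Each face is expected 10 times. Observed counts are (8, 7, 12, 9, 14, 10). Test at the 5% level whether the die is fair.

H₀: the die is fair. χ² = (4 + 9 + 4 + 1 + 16 + 0) / 10 = 3.4 with ν = 6 − 1 = 5. The 5% critical value of χ²(5) is 11.070, and 3.4 < 11.070. Do not reject H₀.

There is insufficient evidence at the 5% level that the die is unfair.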

Try it yourself

A tetrahedral die (4 faces) is rolled 40 times. Expected count per face: 10. Observed counts: (8, 12, 7, 13). Test at 5% whether the die is fair.

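Answer

H₀: the die is fair. χ² = (4 + 4 + 9 + 9) / 10 = 2.6 with ν = 4 − 1 = 3. The 5% critical value of χ²(3) is 7.815, and 2.6 < 7.815. Do not reject H₀.

There is insufficient evidence at the 5% level that the die is unfair.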
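
The χ² statistic is a single sum over cells, so it is easy to check by machine. A minimal sketch follows; `chi_squared_statistic` is an illustrative name, and the critical values are the standard table values for χ²(5) and χ²(3) at 5%.

```python
def chi_squared_statistic(observed: list[float], expected: list[float]) -> float:
    """Return the sum of (O - E)^2 / E over all cells."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Six-sided die example: nu = 6 - 1 = 5, 5% critical value 11.070 from tables.
stat_die = chi_squared_statistic([8, 7, 12, 9, 14, 10], [10] * 6)
print(round(stat_die, 2), stat_die > 11.070)     # 3.4 False

# Tetrahedral die example: nu = 4 - 1 = 3, 5% critical value 7.815 from tables.
stat_tetra = chi_squared_statistic([8, 12, 7, 13], [10] * 4)
print(round(stat_tetra, 2), stat_tetra > 7.815)  # 2.6 False
```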

8. Spearman's rank correlation coefficient

A-level Further Maths: Edexcel (Further Statistics 1), AQA (Statistics), OCR MEI (Statistics)

A non-parametric alternative to the PMCC. It tests for monotonic association between two variables by ranking the data — no normality assumption is needed.

Setup

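The test statistic is:

r_s = 1 − 6Σd_i² / (n(n² − 1))

where d_i is the difference between ranks for each pair.

Decision rule

With H₀: ρ_s = 0, reject H₀ if r_s lies beyond the tabulated critical value r_s,crit for the given n and significance level: r_s ≥ r_s,crit for positive association, r_s ≤ −r_s,crit for negative association, or |r_s| ≥ r_s,crit for a two-tailed test.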

Worked example

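Seven students are ranked by two judges. The rank differences are (2, −1, 0, 1, −2, 1, −1), giving Σd² = 12. Test at the 5% level for positive association.

H₀: ρ_s = 0, H₁: ρ_s > 0. r_s = 1 − (6 × 12) / (7 × 48) = 1 − 72/336 ≈ 0.786. The one-tailed 5% critical value for n = 7 is 0.7143, and 0.786 > 0.7143. Reject H₀.

There is (just) sufficient evidence at the 5% level of positive rank association.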

Try it yourself

Eight athletes are ranked by race time and by resting heart rate. The rank differences are (1, 0, −1, 2, −1, 0, 1, −2). Test at 5% for positive rank association.

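Answer

H₀: ρ_s = 0, H₁: ρ_s > 0. Σd² = 1 + 0 + 1 + 4 + 1 + 0 + 1 + 4 = 12, so r_s = 1 − (6 × 12) / (8 × 63) = 1 − 72/504 ≈ 0.857. The one-tailed 5% critical value for n = 8 is 0.6429, and 0.857 > 0.6429. Reject H₀.

There is sufficient evidence at the 5% level of positive rank association.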
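
Spearman's coefficient depends only on the rank differences, so both examples in this section can be checked from the d values alone. A sketch, assuming no tied ranks; `spearman_from_differences` is an illustrative name.

```python
def spearman_from_differences(differences: list[int]) -> float:
    """Return r_s = 1 - 6 * sum(d^2) / (n (n^2 - 1)); valid when there are no tied ranks."""
    n = len(differences)
    sum_d_squared = sum(d * d for d in differences)
    return 1 - 6 * sum_d_squared / (n * (n * n - 1))

r_judges = spearman_from_differences([2, -1, 0, 1, -2, 1, -1])
print(round(r_judges, 4))    # 0.7857

r_athletes = spearman_from_differences([1, 0, -1, 2, -1, 0, 1, -2])
print(round(r_athletes, 4))  # 0.8571
```

The same Σd² = 12 gives a larger r_s for n = 8 than for n = 7, because the denominator n(n² − 1) grows quickly with n.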

9. F distribution

A-level Further Maths: Edexcel (Further Statistics 2), AQA (Statistics), OCR MEI (Statistics)

Used to compare the variances of two independent normal populations. The F statistic is the ratio of two sample variances. By convention, the larger sample variance goes in the numerator so the observed F is always ≥ 1.

Setup

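Under H₀:

σ₁² = σ₂², and F = s₁² / s₂² ~ F(n₁ − 1, n₂ − 1), where s₁² is the larger sample variance

Decision rule

Reject H₀ if the observed F exceeds the upper critical value of F(ν₁, ν₂). Because the larger variance is always placed in the numerator, a two-tailed test at level α uses the upper α/2 critical value.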

Worked example

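Machine A: n₁ = 10, s₁² = 24. Machine B: n₂ = 12, s₂² = 8. Test at the 10% level whether the variances differ.

H₀: σ₁² = σ₂², H₁: σ₁² ≠ σ₂². F = 24/8 = 3.0 with (ν₁, ν₂) = (9, 11). The test is two-tailed at 10%, so use the upper 5% critical value of F(9, 11), which is 2.90. Since 3.0 > 2.90, reject H₀.

There is sufficient evidence at the 10% level that the variances of the two machines differ.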

Try it yourself

Machine A: n₁ = 8, s₁² = 25. Machine B: n₂ = 9, s₂² = 5. Test at 10% whether the variances differ.

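Answer

H₀: σ₁² = σ₂², H₁: σ₁² ≠ σ₂². F = 25/5 = 5.0 with (ν₁, ν₂) = (7, 8). The upper 5% critical value of F(7, 8) is 3.50, and 5.0 > 3.50. Reject H₀.

There is sufficient evidence at the 10% level that the variances of the two machines differ.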
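
The only bookkeeping in the F test is keeping the degrees of freedom attached to the right variance when the larger one moves to the numerator. A small sketch of that step, with `f_statistic` as an illustrative name:

```python
def f_statistic(var_a: float, n_a: int, var_b: float, n_b: int) -> tuple[float, int, int]:
    """Return (F, nu1, nu2) with the larger sample variance in the numerator."""
    if var_a >= var_b:
        return var_a / var_b, n_a - 1, n_b - 1
    return var_b / var_a, n_b - 1, n_a - 1

print(f_statistic(24, 10, 8, 12))  # (3.0, 9, 11)
print(f_statistic(5, 9, 25, 8))    # (5.0, 7, 8): the order of the arguments does not matter
```

The returned degrees of freedom are the ones to look up in the F table, numerator first.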

10. Student's t distribution

A-level Further Maths: Edexcel (Further Statistics 1), AQA (Statistics), OCR (Statistics A), OCR MEI (Statistics)

Used to test a population mean when the variance is unknown and must be estimated from the sample. This is more realistic than the z-test. A two-sample version tests whether two population means are equal.

One-sample setup

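Under H₀:

T = (X̄ − μ₀) / (S/√n) ~ t(n − 1)

Decision rule

Reject H₀ if the observed t lies beyond the critical value of t(n − 1): |t| ≥ t_crit for a two-tailed test at level α (α/2 in each tail), or t beyond the one-tailed critical value in the direction of H₁.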

Worked example

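A fertiliser is claimed to produce a mean yield of μ = 40 kg. A trial on n = 16 plots gives x̄ = 38.2 kg, s = 3.6 kg. Test at the 5% level (two-tailed) whether the mean yield differs from 40 kg.

H₀: μ = 40, H₁: μ ≠ 40. t = (38.2 − 40) / (3.6/√16) = −1.8/0.9 = −2.0 with ν = 15. The two-tailed 5% critical value of t(15) is 2.131, and |−2.0| < 2.131. Do not reject H₀.

There is insufficient evidence at the 5% level that the mean yield differs from 40 kg.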

Try it yourself

A drug claims to reduce blood pressure by μ = 15 mmHg. A trial on n = 10 patients gives x̄ = 11 mmHg, s = 6 mmHg. Test at 5% whether the true mean reduction differs from 15 mmHg.

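Answer

H₀: μ = 15, H₁: μ ≠ 15. t = (11 − 15) / (6/√10) = −4/1.897 ≈ −2.108 with ν = 9. The two-tailed 5% critical value of t(9) is 2.262, and |−2.108| < 2.262. Do not reject H₀.

There is insufficient evidence at the 5% level that the true mean reduction differs from 15 mmHg.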
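
The one-sample t statistic has the same shape as the z statistic, with s in place of σ. A sketch for checking the two examples above; `t_statistic` is an illustrative name and the critical values are standard two-tailed 5% table values.

```python
from math import sqrt

def t_statistic(sample_mean: float, mu0: float, s: float, n: int) -> tuple[float, int]:
    """Return (t, degrees of freedom) for H0: mu = mu0 with sample standard deviation s."""
    return (sample_mean - mu0) / (s / sqrt(n)), n - 1

# Fertiliser example: two-tailed 5% critical value of t(15) is 2.131.
t_yield, df_yield = t_statistic(38.2, 40, 3.6, 16)
print(round(t_yield, 3), df_yield)  # -2.0 15
print(abs(t_yield) >= 2.131)        # False

# Blood pressure example: two-tailed 5% critical value of t(9) is 2.262.
t_drug, df_drug = t_statistic(11, 15, 6, 10)
print(round(t_drug, 3), df_drug)    # -2.108 9
print(abs(t_drug) >= 2.262)         # False
```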

Two-sample t-test

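To compare the means of two independent populations with unknown but equal variances (pooled t-test):

s_p² = ((n₁ − 1)s₁² + (n₂ − 1)s₂²) / (n₁ + n₂ − 2)

T = (X̄₁ − X̄₂) / (s_p √(1/n₁ + 1/n₂)) ~ t(n₁ + n₂ − 2) under H₀: μ₁ = μ₂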
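
The pooled test weights each sample variance by its degrees of freedom before forming a single standard error. The sketch below shows the arithmetic; the summary statistics are hypothetical and invented for illustration, `pooled_t` is an illustrative name, and the variances passed in are assumed to be unbiased sample variances.

```python
from math import sqrt

def pooled_t(mean1, var1, n1, mean2, var2, n2):
    """Return (t, degrees of freedom) for the pooled two-sample t-test of H0: mu1 = mu2."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / df
    standard_error = sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / standard_error, df

# Hypothetical summary statistics, invented for illustration only.
t_value, df = pooled_t(52, 16, 10, 48, 20, 12)
print(round(t_value, 2), df)  # 2.19 20
```

The observed t is then compared with the critical value of t(20) in the usual way.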

11. Wilcoxon rank-sum test (Mann–Whitney)

A-level Further Maths: Edexcel (Further Statistics 2), AQA (Statistics), OCR MEI (Statistics)

A non-parametric test for comparing two independent populations, useful when data are not normally distributed. It tests whether the two populations have the same distribution, which is usually phrased as having the same median on the assumption that the distributions differ only in location.

Procedure

Combine all n₁ + n₂ observations and rank them from 1 to n₁ + n₂.
Sum the ranks for one group: W = sum of ranks for group 1.
Compare W to the critical values W_lower and W_upper from tables.

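Decision rule

For a two-tailed test, reject H₀ if W ≤ W_lower or W ≥ W_upper. For a one-tailed test, compare W with the single critical value in the direction of H₁.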

Worked example

Group A (n₁ = 4): scores 3, 7, 11, 15. Group B (n₂ = 4): scores 2, 6, 8, 14. Test at 5% (two-tailed) whether the medians differ.

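Combined ranks (score → rank): 2→1, 3→2, 6→3, 7→4, 8→5, 11→6, 14→7, 15→8.

Group A holds ranks 2, 4, 6 and 8, so W = 2 + 4 + 6 + 8 = 20.

Critical values for n₁ = n₂ = 4, 5% two-tailed: W_lower = 10, W_upper = 26. Only the most extreme rank sum in each tail qualifies, since P(W ≤ 10) = 1/70 ≈ 0.014 ≤ 0.025 but P(W ≤ 11) = 2/70 ≈ 0.029 > 0.025.

Since 10 < 20 < 26, do not reject H₀. There is insufficient evidence at the 5% level that the medians differ.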
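
For samples this small the null distribution of W can be enumerated exactly, because under H₀ every choice of n₁ ranks out of n₁ + n₂ is equally likely. The sketch below follows the procedure in this section and assumes no tied scores; `rank_sum` and `upper_tail_probability` are illustrative names.

```python
from itertools import combinations

def rank_sum(group1: list[float], group2: list[float]) -> int:
    """Return W, the sum of group1's ranks in the combined sample (assumes no ties)."""
    combined = sorted(group1 + group2)
    rank_of = {value: rank for rank, value in enumerate(combined, start=1)}
    return sum(rank_of[value] for value in group1)

def upper_tail_probability(w: int, n1: int, n2: int) -> float:
    """Exact P(W >= w) under H0, where every choice of n1 ranks is equally likely."""
    all_sums = [sum(c) for c in combinations(range(1, n1 + n2 + 1), n1)]
    return sum(s >= w for s in all_sums) / len(all_sums)

w = rank_sum([3, 7, 11, 15], [2, 6, 8, 14])
print(w)                                          # 20
print(round(upper_tail_probability(w, 4, 4), 3))  # 0.343
```

An upper-tail probability of about 0.34 is nowhere near 0.025, which agrees with the conclusion that the medians are not shown to differ.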

Summary: the same question, eleven ways

Test	Level	Model under H₀	"More extreme" means
Binomial	Maths	X ~ B(n, p₀)	X ≥ x or X ≤ x
Normal z-test	Maths	Z ~ N(0, 1)	|Z| ≥ z or Z ≥ z
PMCC	Maths	ρ = 0	|r| ≥ r_crit
Poisson	Further	X ~ Po(λ₀)	X ≥ x or X ≤ x
Geometric	Further	X ~ Geo(p₀)	X ≤ x or X ≥ x
Negative binomial	Further	X ~ NB(r, p₀)	X ≥ x or X ≤ x
Chi-squared	Further	χ² ~ χ²(ν)	χ² ≥ χ²_obs
Spearman's rank	Further	ρ_s = 0	|r_s| ≥ r_s,crit
F distribution	Further	F ~ F(ν₁, ν₂)	F ≥ F_crit
Student's t	Further	T ~ t(n−1)	|T| ≥ t_crit
Wilcoxon rank sum	Further	same median	W ≤ W_lo or W ≥ W_hi